(57) Abstract: Methods and apparatus for implementing an enhanced digital signal processor through the addition of modular
computation units which can be operated in parallel are described. In various embodiments the computation units are implemented as
configurable computation cells which are arranged to form a computation engine which supplements conventional DSP circuitry.
The computation cells can be used to perform frequently used DSP functions such as cross-correlation, sorting, and FIR filtering quickly
without the need for extensive iterative processing. By using the computation cells of the present invention in parallel, the
computation of common DSP functions can be performed quickly, resulting in improvements in DSP performance as compared to
conventional DSPs.
METHODS AND APPARATUS FOR ENHANCING DIGITAL SIGNAL
PROCESSORS

Field of the Invention 

The present invention relates to methods and 
apparatus for performing digital signal processing 
operations and, more specifically, to methods and 
apparatus for enhancing digital signal processors. 

Background 

As technology for digital electronics has
advanced, digital signal processing using digital
computers and/or customized digital signal processing
circuits has become ever more important. Applications for
digital signal processing include audio, video, speech
processing, communications, system control, and many
others. One particularly interesting application for
digital signal processing is the communication of audio
signals over the Internet.

The transmission of audio signals over the
Internet offers the opportunity to communicate voice
signals, in digital form, anywhere in the world at
relatively little cost. As a result, there has been an
ever growing interest in voice transmission over the
Internet. In fact, Internet telephony is a fast growing
business area due to its promise of reducing and/or
eliminating much of the cost associated with telephone
calls. In order to support Internet telephony and/or
other applications which may be required to process
digital audio and/or video signals, DSPs are frequently
used.

Thus, DSPs used to process audio signals are 
found in digital telephones, audio add-in cards for 
personal computers, and in a wide variety of other 
devices. In addition to processing of audio signals, a 
single DSP may be called upon to process a wide range
of digital data including video data and numeric data. 

Digital audio and/or video files or data
streams representing sampled audio and/or video images
can be rather large. In the interests of reducing the
amount of memory required to store such files and/or the
amount of bandwidth required to transmit such files, data
compression is frequently used. In order to determine if
a specific set of data, e.g., a subset of the data being
subject to compression, will benefit from compression
processing, a correlation operation is often performed.
Data compression is then performed on subsets of the data
being processed as a function of the output of the
correlation operation. Accordingly, correlation
operations are frequently performed when processing audio
data, video data and other types of data.

As will be discussed in detail below, cross
correlation generally involves processing two sequences
of numbers, each sequence including, e.g., N elements, to
produce an output sequence which also has N elements,
where N may be any positive integer. Each element of the
input and output sequences is normally a number
represented by one or more bits. Cross correlation
processing generally requires N multiplications and N-1
additions to produce each of the N output elements.
Thus, a total of N^2 multiplications and (N^2-N) additions
must normally be performed to produce an N element cross
correlation output.
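
The operation counts above can be checked with a short sketch. It
assumes a circular cross-correlation, which is one way to obtain
exactly N outputs from two N-element inputs; the text does not say
which form is intended. The function name and the counters are
illustrative only.

```python
def cross_correlate_counted(x, y):
    """Circular cross-correlation that also counts arithmetic operations."""
    n = len(x)
    assert len(y) == n and n > 0
    multiplies = 0
    additions = 0
    out = []
    for lag in range(n):
        # N products are formed for each output element.
        products = [x[i] * y[(i + lag) % n] for i in range(n)]
        multiplies += n
        # Summing N products takes N-1 additions.
        acc = products[0]
        for p in products[1:]:
            acc += p
            additions += 1
        out.append(acc)
    return out, multiplies, additions
```

For N = 4 the counters come to 16 multiplications and 12 additions,
matching N^2 and N^2-N.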

From a cost standpoint, it is desirable to
avoid building into a DSP a large amount of customized
circuitry which is likely to be used only infrequently or
is likely to go unused altogether. In typical DSP
applications, software is normally used to configure
adders, subtracters, multipliers and registers to perform
various functions. In some cases, additional specialized
circuitry may be included in the DSP. For example, some
DSPs include a relatively small number, e.g., two,
Multiply-and-Accumulate (MAC) processing units. The MAC
processing units can be used to multiply two numbers and
add the result into a storage register sometimes called
an accumulator. MAC units may be reused under software
control.

Since the number of MAC units in typical DSPs
is relatively limited, computationally intensive
calculations such as, e.g., cross-correlation, normally
have to rely on software loops and/or multiple processing
iterations to be completed.
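
As a rough illustration of why a small number of MAC units forces
iteration, the sketch below counts loop passes for an N-element
cross-correlation. It assumes each MAC unit completes one
multiply-accumulate per pass, which is a simplification and not a
figure taken from any particular DSP; the function name is
hypothetical.

```python
import math

def mac_passes(n, mac_units=2):
    """Loop passes needed for the N*N multiply-accumulates of an
    N-element cross-correlation, assuming one operation per MAC
    unit per pass."""
    total_macs = n * n
    return math.ceil(total_macs / mac_units)
```

Under these assumptions, a 40-element sequence on two MAC units
needs 800 passes.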

In addition to cross-correlation, other 
frequently used DSP functions include sorting, finite 
impulse response filtering, convolution, vector sum, 
vector product, and min/max selection. In many 
applications, such functions generally involve arithmetic 
calculations applied to long sequences of numbers 
representing discrete signals. 

In many applications, the amount of time
available to process a set of data is limited by real
world constraints, such as the rate at which digital data
representing an audio signal will be used to generate
audio signals that are presented to a listener. Real
time processing is often used to refer to processing that 
needs to be performed at or near the rate at which data 
is generated or used in real world applications. In the 
case of audio communications systems, such as telephones, 
failure to process audio in or near real time can result 
in noticeable delays, noise, and/or signal loss. 

While the use of iterative loops to perform 
signal processing operations serves to limit the need for 
specialized circuitry in a DSP, it also means that DSPs 
often need to support clock speeds which are much higher 
than would be required if more computationally complex 
operations could be performed without the need for 
iterative processing operations or with fewer iterative
processing operations. 

In view of the above discussion, it is apparent 
that there is a need for methods and apparatus which can 
be used to reduce the need for iterative processing 
operations in DSPs. It is desirable from an 
implementation standpoint, that any new circuitry be 
modular in design. It is also desirable that circuitry 
used to implement at least some new methods and apparatus 
be capable of being used to support one or more common 
DSP processing operations. In addition, from a hardware
efficiency standpoint, it would be beneficial if at least 
some circuits were easily configurable so that they could 
be used to support multiple DSP processing operations. 

SUMMARY OF THE INVENTION

The present invention is directed to methods 
and apparatus for improving the way in which digital 
signal processors perform a wide variety of common 
operations including cross-correlation, sorting, and finite
impulse response filtering, in addition to other
operations which use multiply, add, subtract, compare 
and/or store functionality. 

In accordance with various embodiments of the
present invention, digital signal processors and/or other
programmable circuits are enhanced through the addition
of one or more computation engines. The computation
engines of the present invention are of a modular design 
with each computation engine being constructed from a 
plurality of computation cells each of which may be of 
the same design. The computation cells are connected to 
form a sequence of cells capable of performing processing 
operations in parallel. 

In embodiments where the computation results 
are read out of the last computation cell in a sequence
of computation cells, the values resulting from the 
processing of each computation cell can be shifted out of 
the computation engine with the results being passed from 
computation cell to computation cell so that the results 
of multiple cells can be read. 

The computation cells of the present invention 
may be implemented to perform a specific function such as 
cross-correlation, sorting or filtering. Thus, a 
computation engine may be dedicated to supporting a 
particular function such as cross-correlation.

However, in other embodiments, the computation
cells are designed to be configurable, allowing a
computation engine to support a wide range of
applications.

One or more multiplexers may be included in each
computation cell to allow reconfiguring of the
computation cell and thus how signals are routed between
the computation cell components and which computation
cell components are used at any given time.




By reconfiguring the way in which the signals are 
supplied to the internal components of the computation 
cells and the way in which signals are passed between 
computation cell components, multiple signal processing 
operations can be performed using the same computation
cell hardware. 

A control value supplied to each computation cell in 
a computation engine can be used to control the 
components of the computation cells and how each of the 
computation cells is configured. In some embodiments,
e.g., embodiments which support sorting, the
configuration of a computation cell is also controlled,
in part, by a cascade control signal generated by a
preceding computation cell in the sequence of computation
cells.

A control register may be included in the 
computation engine for storing the control value used to 
control the configuration of the individual computation 
cells included in the computation engine. The output of 
the control register is supplied to a control value input 
of each of a computation engine's computation cells.
Thus, the configuration of the computation engine's
computation cells can be modified by simply writing a new
control value into the control register.



A control value may be several bits, e.g., 12 bits,
in length. In one embodiment, different fields of the 12
bit control signal are dedicated to controlling different
elements of the computation cells. For example,
different bits may be dedicated to controlling different
multiplexers, while another set of bits is dedicated to
controlling the resetting of values stored in computation
cell storage devices, while yet another bit is set to
control whether an adder/subtractor performs addition or
subtraction.
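
The idea of packing several control fields into one 12 bit value
can be sketched as follows. The bit positions and field widths
below are hypothetical and chosen only for illustration; this
passage does not give the actual layout.

```python
from typing import NamedTuple

class ControlFields(NamedTuple):
    mux_select: int     # routing bits for the cell multiplexers
    storage_reset: int  # one reset bit per storage element
    subtract: bool      # True: the adder/subtractor subtracts

# Hypothetical layout of the 12 bit control value (an assumption):
#   bits 0-7   multiplexer select bits
#   bits 8-10  storage reset bits
#   bit  11    add/subtract select
def decode_control(value: int) -> ControlFields:
    if not 0 <= value < (1 << 12):
        raise ValueError("control value must fit in 12 bits")
    return ControlFields(
        mux_select=value & 0xFF,
        storage_reset=(value >> 8) & 0x7,
        subtract=bool((value >> 11) & 0x1),
    )
```

Because every cell decodes the same value, writing one new value
reconfigures all cells at once.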



In accordance with the present invention, a
software controllable portion of a digital signal
processor can be used to control the configuration of a
computation engine of the present invention by
periodically storing an updated control value in the
computation engine's control register. In addition, the
software controllable portion of the digital signal
processor can supply data to be processed to one or more
data inputs included in the computation engine and
receive, e.g., read out, the results of a processing
operation performed by the computation engine of the
present invention.





Both the software controllable digital signal 
processing circuitry and the computation engine of the 
present invention are, in various embodiments,
implemented on the same semiconductor chip.

Because the present invention allows all or
portions of many processing operations to be performed in
parallel through the use of multiple computation
circuits, processing efficiencies can be achieved as
compared to embodiments where software loops are used in
place of the parallel hardware circuits of the present 
invention. 

Additional features, embodiments and benefits
of the methods and apparatus of the present invention
will be discussed below in the detailed description which
follows.





BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 illustrates an enhanced signal processor 
implemented in accordance with the present invention. 

Fig. 2 illustrates a computation engine
implemented in accordance with one exemplary embodiment
of the present invention.

Fig. 3 illustrates a multi-purpose computation
engine implemented in accordance with another embodiment
of the present invention.

Fig. 4 illustrates a cross-correlation 
computation cell of the present invention suitable for 
use in the computation engine illustrated in Fig. 2. 

Fig. 5 illustrates a sorting cell of the
present invention suitable for use in the computation 
engine illustrated in Fig. 2. 

Fig. 6 illustrates an FIR filter cell of the 
present invention suitable for use in the computation 
engine illustrated in Fig. 2. 

Fig. 7 illustrates a multi-purpose configurable 
computation cell of the present invention which may be 
used in the computation engines illustrated in Figs. 2 or 
3.

Fig. 8 illustrates control logic which may be 
used as the control logic of the multi-purpose 
configurable computation cell illustrated in Fig. 7. 

DETAILED DESCRIPTION



The present invention relates to methods and
apparatus for enhancing digital signal processors. Fig.
1 illustrates a digital signal processor 100 implemented
in accordance with the present invention. As illustrated,
the DSP 100 includes first and second programmable
processing circuits 102, 102'. The programmable
processing circuits 102, 102' include data inputs via
which they can receive one or more data streams
representing, e.g., sampled signals. Each data stream
may correspond to, e.g., one or more physical or virtual
voice channels. The programmable processing circuits
102, 102' process the received signals under control of
software 104, 104' and in conjunction with computation
engine 103 which is coupled to the programmable
processing circuits 102, 102'. Data input control circuit
106, which may be implemented using a multiplexer,
determines which programmable processing circuit 102, 102'
supplies data to the computation engine 103 at any given
time. Data input control 106 is responsive to a control
signal received from programmable processing circuit 102
to determine which of the two processors 102, 102' will
supply data to the computation engine at any given time.

The processing performed by processing circuits
102, 102' operating in conjunction with the computation
engine 103 may include various types of voice signal
processing, e.g., data compression/decompression,
filtering and/or identification of maximum values, such as
amplitude values, in blocks of data being processed. The
data compression/decompression operation may involve
performing one or more correlation operations. The
filtering may be, e.g., finite impulse response (FIR)
filtering, which is performed on one or more voice
signals being processed. The sorting operation may
involve identifying the maximum and/or minimum signal
amplitude values in a block of data representing such
values which is being processed.



Programmable processors 102, 102' share the
computation engine as a common resource which can be used
on a time shared basis. The computation engine 103, in
accordance with the present invention, receives data and
configuration information from one or both of the
programmable processors 102, 102'.

Both the computation engine 103 and processing circuits
102, 102' may be implemented on the same semiconductor to
form a single chip implementation of the enhanced DSP
100.





Programmable processing circuitry 102, 102' is
capable of performing operations under software control
as done in conventional DSP circuits. However, it also
has the ability to control the configuration of the
computation engine 103, to send data for processing to
the computation engine 103 and to receive the results of
processing operations performed by the computation engine
103. As will be discussed below, the computation engine
103 includes hardware circuitry which allows parallel
processing operations to be performed on received data,
thereby facilitating many common operations, e.g.,
cross-correlation, sorting, and FIR filtering, to name but
a few examples.

The processing circuits 102, 102' receive
digital data, e.g., sampled signals, to be processed.
When a processing operation needs to be performed for
which the computation engine 103 can be used, the
processing circuits 102, 102' pass the data to be
processed to the computation engine 103 and then receive
back from the computation engine the result of the
processing operation.



The data received back from the computation 
engine 103 may be used in further processing performed by 
the processing circuitry 102, 102'. The circuitry 102, 
102' outputs processed signals, e.g., digital data,
produced by the processing it performs. In some cases, 
the output of the computation engine 103 is used directly 
as the processed signal output of the digital signal 
processor 100. 




In accordance with the present invention, the
programmable processor 102 can configure the computation
engine to perform any one of a variety of supported
operations. Thus, e.g., the programmable processor 102
may control the computation engine to first perform a
correlation operation, then a filtering operation which
may then be followed by another correlation operation or
even a sorting operation. Virtually any order of
supported operations is possible. Different operations
may be used for processing different data streams, e.g.,
corresponding to different voice channels or paths
whether they be physical or virtual. In a similar manner,
processing circuit 102' can control the configuration and
thus processing of computation engine 103. Thus, the
first programmable processing circuit 102 may use the
computation engine to perform one or more correlation
operations while the second processing circuit may
control the computation engine to perform one or more
other processing operations, e.g., FIR filtering or
sorting.

While Fig. 1 illustrates two processing
circuits 102, 102', it should be understood that
additional processing circuits may share the computation
engine 103 and/or the computation engine may be paired
with a single processing circuit 102.

Fig. 2 illustrates a first exemplary
computation engine 200 which may be used as the
computation engine illustrated in Fig. 1. As
illustrated, the exemplary computation engine includes a
plurality of M computation cells 202, 204, 206 which are
coupled together to form a sequence or cascade of first
through Mth computation cells. As will be discussed
below, each of the computation cells 202, 204, 206 may be
implemented using the same or similar circuitry. This
has the advantage of allowing for a simple and consistent
method of controlling each of the computation cells 202,
204, 206. It can also help simplify the manufacturing of
the computation engine 200 by avoiding the need to
manufacture multiple unique circuits to implement each
computation cell.



In the Fig. 2 embodiment, the computation cells
are controlled in unison by bits of a global control
value. The global control value is loaded into a global
control register 208 which is coupled to a global control
value input of each of the computation cells 202, 204,
206. In some embodiments, configuration of individual
cells is further controlled by a cascade control signal
which is generated by a preceding computation cell when
a preceding computation cell is present.

Each computation cell 202, 204, 206 has several 
inputs and at least one output. The inputs include a
data input, a broadcast input, and an optional cascade 
control signal input. The output of each computational 
cell includes a data signal output and, optionally, a 
cascade control signal output. Each signal input and 
output may correspond to one or more signal lines. 

The data input of the computation engine 200 is
coupled to the Broadcast input of each of the first
through Mth computation cells 202, 204, 206. In this
manner, each computation cell is supplied with the input
data received by the computation engine 200. The data
input of the computation engine 200 is also coupled to
the data input of the first computation cell 202.

The data output of each computation cell is
coupled with the data input of the next computation cell
in the sequence of M computation cells. The data output
of the last (Mth) computation cell 206 is coupled to the
data output of the computation engine 200.

In addition to data inputs and outputs, each
computation cell may include an optional cascade control
input and cascade control output. Since these signal
inputs and outputs may be omitted in some embodiments,
they are shown in Fig. 2 using dashed lines. When
present, the cascade control input of the first
computation cell 202 is supplied with a constant value,
e.g., 0. The cascade control output of each of the first
through M-1 computation cells, when present, is coupled
to the cascade control input of the subsequent
computation cell. The cascade control output of the Mth
computation cell goes unused.
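
The Fig. 2 interconnect described above can be summarized as a
small wiring table. The sketch below is illustrative only: the
signal names such as "engine.data_in" and "const0" are invented
labels, and cells are numbered from 0 rather than from 1.

```python
def wire_engine(m, with_cascade=True):
    """Return, for each of the M cells, the source that feeds each
    input, plus the source of the engine data output."""
    cells = []
    for i in range(m):
        cell = {
            # Every cell sees the engine data input on its Broadcast input.
            "broadcast_in": "engine.data_in",
            # The first cell takes the engine input; later cells take
            # the data output of the preceding cell.
            "data_in": "engine.data_in" if i == 0 else f"cell{i - 1}.data_out",
        }
        if with_cascade:
            # The first cell's cascade input is tied to a constant 0.
            cell["cascade_in"] = (
                "const0" if i == 0 else f"cell{i - 1}.cascade_out"
            )
        cells.append(cell)
    # Only the last cell drives the engine data output.
    engine_data_out = f"cell{m - 1}.data_out"
    return cells, engine_data_out
```

Calling `wire_engine(3)` lists three cells whose data inputs form a
chain while their Broadcast inputs all share the engine data input.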




In various embodiments, the Data Input of the 
computation engine 200 and each computation cell 202, 
204, 206 includes up to three distinct parallel data 
inputs. The data outputs of the computation engine and
each computation cell normally include the same number
of distinct parallel data outputs as inputs.



The data input and data output of each
computation cell 202, 204, 206 may be implemented as a
single, double or triple data path. The three data
signals which may be received via the data input, in the
case of a triple data path implementation, are DATA1,
DATA2 and DATA3. Similarly, DATA1, DATA2, and DATA3
output signals may be generated by a computation cell.
For each received data input signal a corresponding data
output signal is normally generated and supplied to the
corresponding data input of the next computation cell in
the sequence of cells 202, 204, 206. In the case where
multiple parallel inputs are supported as part of the
data input to each computation cell, one or more of the
data inputs may be active at any time depending on the
particular implementation and processing being performed.

In a similar manner, the Broadcast input may be
implemented as a single or a double input. In some
embodiments, a single Broadcast input is used and a
single broadcast signal is supplied to each of the
computation cells, while in other embodiments two
broadcast inputs are used, allowing for up to two
broadcast signals, Broadcast1 and Broadcast2, to be
received by each computation cell. Each Broadcast signal
corresponds to a different one of the Data Input signals
which may be supplied via parallel paths to the
computation engine. Thus, via the Broadcast input, a
Broadcast1 and/or Broadcast2 signal can be received. The
Broadcast1 input of each computation cell 202, 204, 206,
when present, is coupled to the DATA1 input of the
computation engine 200 and therefore receives the same
input signal as the DATA1 signal input of the first
computation cell 202. The Broadcast2 input of each
computation cell 202, 204, 206, when present, is coupled
to the DATA2 input of the computation engine and
therefore receives the same input signal as the DATA2
input signal supplied to the first computation cell 202.

Figure 3 illustrates a computation engine 300
where the DATA INPUT to the computation engine includes
three parallel data inputs, DATA1, DATA2, DATA3. In
addition, the data input and output of each computation
cell 302, 304, 306 corresponds to three parallel data
inputs and three parallel data outputs labeled DATA1,
DATA2, DATA3. The data inputs DATA1, DATA2, DATA3 of the
computation engine 300 are coupled to the corresponding
data inputs of the first computation cell 302. The data
outputs of each of the first through M-1 computation
cells are coupled to the corresponding data inputs of the
next computation cell in the sequence of computation
cells 302, 304, 306.

In addition to being coupled to the DATA1 input
of the first computation cell 302, the DATA1 input of the
computation engine is coupled to a BROADCAST1 input of
each of the computation cells 302, 304, 306. In a
similar manner, the DATA2 input of the computation engine
300 is coupled to a BROADCAST2 input of each of the
computation cells 302, 304, 306.

A value of zero is supplied to a cascade
control input of the first computation cell 302. The
cascade control output of each of the first through M-1
computation cells is coupled to the cascade control input
of the next computation cell in the sequence of
computation cells.

The DATA1, DATA2 and cascade control outputs of
the Mth computation cell 306 go unused. The DATA3 output
of the Mth computation cell 306 is coupled to the data
output of the computation engine 300.

A global control register 308 is provided for 
storing a control value used to configure and/or reset 
components included in each of the M computation cells 
302, 304, 306. A global control value input of the 
computation engine is coupled to a corresponding input of 
the global control register 308. A global control value 
output of the global control register is coupled to a 
corresponding input of each one of the computation cells 
302, 304, 306.

The computation cells of the present invention
used in the computation engine 200 or 300 may be
implemented using a relatively small number of such basic
elements as a multiplier, an adder, subtractor,
adder/subtractor, and/or a comparator as the arithmetic
elements. The computation cell normally also includes
some memory elements, e.g., registers, so that previous
input signals or the partial results of a long
computation can be stored. Multiplexers that are
controlled by different fields of the control value
stored in the global control register and/or the cascaded
control signal can be used to configure the computation
cell's elements and to direct various signals or the
previously computed partial results of a long computation
to the arithmetic elements for computation.

The computation cells of a computation engine
200, 300 of the present invention are controlled in
unison by the value stored in the global control register
and individually by cascade control signals generated
internally and/or received from a preceding computation
cell. Since the global control value output by the global
control register controls the configuration of the
computation cells, it is possible to reconfigure the
computation cells of a computation engine by simply
updating the global control value stored in the global
control register 308.

The cascaded control signal, generated in some
embodiments by each of the computation cells, is used to
further refine the functionality within individual
computation cells. That is, a cascade control output
(CCO) signal generated by a computation cell, based on
one or more of its input signals, may be used to control
the next computation cell in the sequence of first
through Mth computation cells.

Individual computation cells, M of which may be
used to implement the computation engine 200 or
computation engine 300, are illustrated in Figs. 4-7.
The computation cells in Figs. 4-6 are well suited for
performing cross-correlation, sorting, and FIR filtering
operations, respectively. Some of the computation cells
illustrated in Figs. 4-6 do not use all the inputs and
outputs shown in the computation engine of Fig. 3.
Accordingly, when a computation engine is constructed
from computation cells which use fewer inputs and outputs
than shown in Fig. 3, the signal paths, e.g., lines, and
unused inputs/outputs may be omitted from the Fig. 3
computation engine in the interests of implementation
efficiency.

The computation cell 400 illustrated in Fig. 4
is well suited for performing correlation operations. M
of the Fig. 4 computation cells may be used to implement
a computation engine 200, 300 suitable for performing M
cross correlation operations in parallel. Some
applications such as, e.g., speech compression, normally
involve a large fixed number of cross correlation
operations to be performed on units of data being
communicated. It is desirable that the computation
engine include enough computation cells 400 to perform
the multiply, add, and accumulate computations associated
with each element of a data sequence corresponding to a
portion of a voice signal being processed, in one or a
small number of clock cycles. If it is not possible to
provide enough computation units to perform the
cross-correlation processing in a single clock cycle, it is
desirable that the number of computation cells be an
integer divisor of the number of elements in a data
sequence upon which a cross correlation operation is
performed. Various exemplary numbers of computation
cells which may be well suited in implementing a
computation engine 200 or 300 for purposes of
cross-correlation include 8, 10, 20, 40, 60, and 240. These
numbers of computation cells are particularly useful in
voice applications where various voice compression
standards involve performing correlation operations on
40, SO, or 240 element sequences. 
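
One reading of the integer divisor guideline is that the work then
splits into equal batches with no cells left idle on the last
batch. The sketch below illustrates that reading for the 40 and
240 element sequence lengths mentioned above; the function name
and the notion of a "pass" are assumptions made for illustration.

```python
def passes_needed(sequence_len, num_cells):
    """Number of equal batches when num_cells evenly divides
    sequence_len; rejects cell counts that would leave a partial
    batch."""
    if sequence_len % num_cells != 0:
        raise ValueError(
            "cell count is not an integer divisor of the sequence length"
        )
    return sequence_len // num_cells
```

For example, 60 cells cover a 240 element sequence in 4 passes,
and 8 cells cover a 40 element sequence in 5 passes.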

The computation cell 400 comprises a first
storage element 402 labeled Storage 1, an additional
storage element 414 labeled Storage 3, a multiplier 404,
summer 408, a first MUX 406 labeled MUX4, and a second
MUX 410 labeled MUX3.

A first operand, Operand1, is received via a
DATA1 input and is supplied to the computation cell's
STORAGE1 storage element 402 and to an A input of
multiplier 404. A second operand, Operand2, is received
via a Broadcast2 input of the computation cell 400 and
supplied to a B input of multiplier 404. Multiplier 404
operates to multiply Operand1 and Operand2 together and
to supply the result to an I1 input of MUX4 406. A logic
value of 0 is applied to an I0 input of MUX4. MUX4 is
controlled by the signal M4CM which will be discussed in
detail below. MUX4, under control of the signal M4CM,
operates to connect one of its inputs to its output at
any given time. The output of MUX4 is coupled to a B
input of summer 408.

A DATA3 input of the computation cell 400 is
coupled to an I1 input of MUX3 410. In this manner, MUX3
receives the DATA3 signal generated by the previous
computation cell or, if the computation cell 400 is the
first computation cell in a computation engine, an input
value of zero. MUX3 410 receives at its data input labeled
I0 the value output by storage element 414 which
corresponds to the DATA3 output of the computation cell
400.

MUX3 410 is controlled by control signal M3CM
to connect one of its inputs to its output at any given
time. The M3CM control signal, like the M4CM control
signal discussed elsewhere, is a two bit signal, with each
bit of the signal being identified by a label, [0] to
indicate the lower order bit and [1] to indicate the
higher order bit of the signal.

The output of MUX3 410 serves as input A to
summer 408. The output of summer 408 is coupled to the
input of STORAGE3 414. The output of STORAGE3 414 serves
as the Data3 output of the computation cell 400.

The contents of STORAGE1 and STORAGE3 may be
reset to zero via storage control signals S1R and S3R,
respectively. These control signals as well as control
signals M3CM and M4CM are generated by control logic 312
from the global control value supplied to computation
cell 400. A circuit which may be used as control logic
312 will be discussed in detail with regard to Fig. 8.

In the Fig. 4 embodiment, the control signals
generated for each of the M computation cells in a
computation engine 200 or 300 will be the same since they
are generated from the same global control value.
Accordingly, the control logic 312 may be placed in the
computation engine 200 or 300 external to the individual
computation cells 400. In this manner, a single control
circuit 312 may be used to control each of the M
computation cells 400, thereby eliminating the need for
each of the M cells 400 to include a control logic
circuit 312.
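
The data path of the cross-correlation cell 400 described above can be sketched in software. The following Python model is a behavioral illustration only. The function name `cell400_step` and the integer encoding of the mux selections (0 for input I0, 1 for input I1) are assumptions made for this example and are not taken from the figures.

```python
def cell400_step(storage3, data3_in, operand1, operand2, m3_sel, m4_sel):
    """Return the next STORAGE3 value of one cross-correlation cell.

    m3_sel picks summer input A: 0 -> STORAGE3 output (I0), 1 -> DATA3 input (I1).
    m4_sel picks summer input B: 0 -> constant 0 (I0), 1 -> multiplier output (I1).
    (Hypothetical encoding, chosen for this sketch.)
    """
    product = operand1 * operand2                      # multiplier 404
    summer_b = product if m4_sel == 1 else 0           # MUX4 406
    summer_a = data3_in if m3_sel == 1 else storage3   # MUX3 410
    return summer_a + summer_b                         # summer 408 feeds STORAGE3 414


# Accumulate 2*3 + 4*5 in STORAGE3, then load a neighbor's value unchanged.
acc = 0
for a, b in [(2, 3), (4, 5)]:
    acc = cell400_step(acc, data3_in=0, operand1=a, operand2=b, m3_sel=0, m4_sel=1)
shifted = cell400_step(acc, data3_in=7, operand1=0, operand2=0, m3_sel=1, m4_sel=0)
```

With MUX4 passing the product and MUX3 passing the stored value, the cell behaves as a multiply-accumulate unit. With MUX4 passing 0 and MUX3 passing the DATA3 input, it simply copies the value presented by the previous cell.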



Fig. 5 illustrates a sorting computation cell
500 implemented in accordance with the present invention.
M computation cells 500 may be used to implement a
computation engine 200 or 300.

The computation cell 500 includes a first
multiplexer labeled MUX4 406', a second multiplexer
labeled MUX3 410', a controllable adder/subtractor 508, a
comparator 502, and a storage element labeled STORAGE3
414. In regard to signal inputs, the sorting computation
cell 500 includes a Broadcast1 input, a Broadcast2 input,
a Data3 signal input, a global control value input and a
cascade control input. In regard to signal outputs, the
sorting cell 500 includes a cascade control signal output
and a Data3 signal output.

The components of the computation cell 500 are
coupled together as illustrated in Fig. 5. In
particular, the Broadcast1 input is coupled to an I2
input of MUX4 406'. Another input of MUX4, an I0 input,
is supplied with a constant value of zero. The output of
MUX4 406' is coupled to a B input of a controllable
adder/subtractor 508.

The Broadcast2 input is coupled to an A input
of the comparator 502 and to an I2 input of MUX3 410'.
The Data3 input is coupled to an I1 input of MUX3 410'.
Another input, an I0 input of MUX3 410', is coupled to the
output of storage element STORAGE3 414. The output of
MUX3 is coupled to an A input of the ASC 508. The ASC
508 receives as a control input an ASC control signal
which corresponds to a pre-selected bit of the global
control input value.



The output of ASC 508 is coupled to the input
of STORAGE3 414. The output of STORAGE3 414 is coupled
to the DATA3 output of the computation cell 500 in
addition to a B input of comparator 502. The output of
comparator 502 is coupled to the cascade control output
of the computation cell 500.

Operation of the sorting computation cell 500
will be clear in view of the discussion of sorting
performed by the multi-purpose computation cell 700 which
may be configured to operate in generally the same manner
as computation cell 500 for sorting purposes.

Fig. 6 illustrates an FIR filter computation
cell 600 which supports programmable filter weights. The
computation cell 600 of the present invention includes a
multiplexer labeled MUX1 602, a controllable adder 608, a
multiplier 404, and first and second storage elements
402, 414 labeled STORAGE1 and STORAGE3, respectively. In
regard to signal inputs, the computation cell 600
includes a DATA1 signal input, a Broadcast2 signal input,
a DATA3 signal input, and a global control value input.
In regard to signal outputs, the FIR computation cell 600
includes a DATA1 signal output and a DATA3 signal output.

The components of the computation cell 600 are
coupled together as illustrated in Fig. 6. In
particular, the DATA1 input is coupled to an I1 input of
MUX1 602. Another input of MUX1, an I0 input, is
supplied with the value output by STORAGE1 402. The
output of MUX1 602 is coupled to an A input of the
multiplier 404 and to the input of STORAGE1 402.

The Broadcast2 input is coupled to a B input of
the multiplier 404. The output of multiplier 404 is
coupled to a B input of the controllable adder 608. The
DATA3 input is coupled to an A input of the adder 608.
The output of the adder 608 is coupled to the
input of STORAGE3 414. The output of STORAGE3 414 is
coupled to the DATA3 output of the computation cell 600.

The global control value signal input of the
computation cell 600 is coupled to control logic 312''
which generates, from the global control value, control
signals used to control MUX1 and adder 608 and to reset the
contents of STORAGE1 402 and STORAGE3 414 as necessary.

In the Fig. 6 embodiment, the control signals
generated for each of the M computation cells 600 in a
computation engine 200 or 300 will be the same since they
are generated from the same global control value.
Accordingly, the control logic 312'' may be placed in the
computation engine 200 or 300 external to the individual
computation cells 600. In this manner, a single control
circuit 312'' may be used to control each of the M
computation cells 600, thereby eliminating the need for
each of the M cells 600 to include a control logic
circuit 312''.

Operation of the FIR filter computation cell
600 will be clear in view of the discussion of filtering
performed by the multi-purpose computation cell 700 which
may be configured, for FIR filtering purposes, to operate
in generally the same manner as computation cell 600.
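
As a rough illustration of how a chain of cells wired as in Fig. 6 could produce filter outputs, the Python sketch below models each cell as adding its DATA3 input to the product of a held weight and the broadcast sample. It rests on assumptions not stated in the text above: that each weight is held in STORAGE1 by recirculating it through the I0 input of MUX1, that input samples arrive on Broadcast2, and that the filter output is read from the DATA3 output of the last cell. The function name `fir_cells` is hypothetical.

```python
def fir_cells(weights, samples):
    """Behavioral model of a chain of FIR cells (assumed usage of cell 600).

    Each cell holds one weight and computes
        STORAGE3 <- DATA3 input + weight * broadcast sample,
    where the DATA3 input is the previous cell's STORAGE3 (0 for the first cell).
    """
    storage3 = [0] * len(weights)
    outputs = []
    for sample in samples:                       # one broadcast sample per cycle
        data3_in = [0] + storage3[:-1]           # values latched before this cycle
        storage3 = [d + w * sample for d, w in zip(data3_in, weights)]
        outputs.append(storage3[-1])             # DATA3 output of the last cell
    return outputs


impulse_response = fir_cells([1, 2, 3], [1, 0, 0, 0])
```

In this model an impulse emerges from the last cell as the weights in reverse cell order, so the weight held in the last cell acts as the first filter tap.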

Fig. 7 illustrates a multi-purpose computation
cell 700 which can be configured as part of a computation
engine 200, 300 to perform a wide variety of tasks
including cross-correlation, sorting and FIR filtering, to
name but a few. M computation cells 700 may be used to
implement the computation engine 200 or 300. In
particular embodiments M is equal to 8, 10, 20, 40, 60,
or 240, although other positive numbers for M are
contemplated and possible. In most cases M is greater
than 2.

In Fig. 7, the computation cell 700 comprises 4
multiplexers (MUXes) labeled MUX1 602, MUX2 704, MUX3
410', MUX4 406'', 3 storage elements labeled STORAGE1
402, STORAGE2 706, STORAGE3 414, 1 multiplier 404, 1
adder/subtractor 508, and 1 comparator 708 in addition to
a control circuit 312'''. The various components of the
computation cell 700 are coupled together as illustrated
in Fig. 7. The control signals to the MUXes have been
labeled M1C, M2C, M3CM, and M4CM for MUX1, MUX2, MUX3,
and MUX4, respectively. In addition, the control signal
for the adder/subtractor has been labeled ASC. The reset
signals for the STORAGE1, STORAGE2 and STORAGE3 storage
elements have been labeled S1R, S2R, S3R, respectively.

In some embodiments, STORAGE1 402 and STORAGE2
706 are of such a size that they can store the same
number of bits of binary data while STORAGE3 414 is of
such a size that it can store approximately twice the
number of bits that STORAGE1 402 can store. The larger
size of STORAGE3 414 is to accommodate the storage of the
result of a multiplication and addition operation. The
contents and output of STORAGE1 402, STORAGE2 706 and
STORAGE3 414 will be reset to 0 when their respective
reset signals S1R, S2R, or S3R are set to logic 1.

Adder/subtractor 508 is controlled by the ASC
signal which, as will be discussed below, is derived from
the global control value output by the global control
register. In some embodiments, the ASC signal
corresponds to a selected bit of the global control value
which may be a multi-bit value, e.g., a 12 bit value.

When ASC is set to a value of logic 0, the
adder/subtractor performs addition (A + B) of its 2
inputs. When ASC is set to a value of logic 1, the
adder/subtractor performs subtraction (A - B) of its 2
inputs.



The comparator 708 performs an arithmetic
comparison of its 2 inputs and generates a single bit
logic signal labeled CC. The output CC is logic 1 when
the CA input is larger than or equal to the CB input (CA
>= CB). The output CC is logic 0 when the CA input is less
than the CB input (CA < CB).

The 4 MUXes 602, 704, 406'', 410' in the
computation cell are 3-input, 1-output MUXes. Thus, for
each MUX, one of the MUX's 3 inputs will be coupled to
its output at any time. Each MUX 602, 704, 406'', 410'
is responsive to a 2-bit control signal (labeled MC) to
determine which one of the inputs is coupled to the
output at a particular point in time. The truth table
below describes how the control signal supplied to a MUX
causes the MUX to direct one of its inputs to its output.

MC    Mux Output
00    I0
01    I1
10    I2
11    Don't care
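
The mux truth table can be expressed as a small selection function. The Python sketch below is illustrative; the name `mux3` is hypothetical, and raising an error for MC = 11 is a choice made for this example, since the table leaves that case as a "don't care".

```python
def mux3(mc, i0, i1, i2):
    """3-input, 1-output mux driven by a 2-bit control value mc (0..3)."""
    if mc == 0b00:
        return i0
    if mc == 0b01:
        return i1
    if mc == 0b10:
        return i2
    # MC = 11 is a "don't care" in the table; this sketch arbitrarily rejects it.
    raise ValueError("MC = 11 is not a defined selection")


selected = mux3(0b10, "I0", "I1", "I2")
```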



The global control value which is stored in the
global control register 308 is used to configure, e.g.,
control the processing of, the computation engine 300 so
it can perform different functions and computations as
required for a particular application. Thus, the
computation cells of a computation engine can be
reconfigured to perform different functions and
computations by simply loading a new control value into
the global control register 208 or 308 which supplies the
global control value to each of the individual computation
cells.



For a computation engine 300 of the type
illustrated in Fig. 3 implemented using M computation
cells of the type illustrated in Fig. 7, a 12-bit global
control value and global control register 308 can be
used. In accordance with one exemplary embodiment of the
present invention, the 12-bit value is divided into
several bit fields with each bit field performing a
different control function, e.g., by controlling a
different circuit in each computation cell. The
following table describes an exemplary bit field mapping
of the global control value and thus global control
register contents.

Bit Number    11    10    9     8     7:6   5:4   3:2   1:0
Field Name    S1R   S2R   S3R   ASC   M1C   M2C   M3C   M4C
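
The exemplary bit field mapping can be illustrated by splitting a 12-bit control word into its named fields. The Python sketch below assumes the control word is written as a binary string with bit 11 first, and that M4C occupies bits 1:0, as the description of bits [0] and [1] elsewhere in this document indicates. The function name `decode_global_control` is hypothetical.

```python
def decode_global_control(bits):
    """Split a 12-character binary string (bit 11 first) into named fields."""
    if len(bits) != 12 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 12-bit binary string")
    value = int(bits, 2)
    return {
        "S1R": (value >> 11) & 0b1,   # reset STORAGE1
        "S2R": (value >> 10) & 0b1,   # reset STORAGE2
        "S3R": (value >> 9) & 0b1,    # reset STORAGE3
        "ASC": (value >> 8) & 0b1,    # 0 = add, 1 = subtract
        "M1C": (value >> 6) & 0b11,   # MUX1 select
        "M2C": (value >> 4) & 0b11,   # MUX2 select
        "M3C": (value >> 2) & 0b11,   # MUX3 select field
        "M4C": value & 0b11,          # MUX4 select field
    }


reset_word = decode_global_control("111000000000")
compute_word = decode_global_control("000000100101")
```

Decoding "111000000000" sets only the three reset fields, while "000000100101" yields M2C = 10, M3C = 01 and M4C = 01 with all other fields zero.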






Bit fields S1R, S2R, S3R correspond to the like
named signals which are used to control whether the
storage elements 1, 2, and 3 in the computation cells are
reset to 0. The corresponding register bits can be
directly connected to the storage element reset signal
inputs in each of the computation cells or routed through
a control logic circuit 312''' which is then responsible
for coupling the register bit values to the storage
element reset inputs. When S1R contains a 1, STORAGE1 is
reset to 0. When S2R contains a 1, STORAGE2 is reset to
0. When S3R contains a 1, STORAGE3 is reset to 0.

Global control register bit field ASC is used
to control whether the adder/subtractor performs
additions or subtractions. The bit of the ASC register
field can be directly connected to the ASC control input
of the adder/subtractor 508 included in each computation
cell or coupled thereto through the control logic circuit
312'''. When ASC has a logic value of 0, additions are
performed by the controlled ASCs. When ASC has a logic
value of 1, subtractions are performed by the controlled
ASCs.



Global control register bit fields M1C and M2C
are used to control the muxes MUX1 and MUX2 of each
computation cell. They can be directly connected to the
mux control signal inputs M1C and M2C of MUX1 and MUX2,
respectively, or coupled thereto via control logic
312'''.



Global control register bit fields M3C and M4C
are used to control the muxes MUX3 410' and MUX4 406'',
respectively. The control of MUX3 and MUX4 also depends
on the value of the cascade control output (CCO) signal
generated by the computation cell in which the controlled
MUX is located. The control is also a function of the
value of the cascade control signal input to the
computation cell in which the controlled MUX is located.



Control logic 312''' is responsible for
generating the control signals M3CM and M4CM which are
used to control muxes MUX3 410' and MUX4 406''. The
following table illustrates the value of signals M3CM and
M4CM, based on the indicated input values.

M3C (or M4C)    M3CM (or M4CM)
00              00
01              01
10              10
11              Depends on Cascade Control Output (CCO) and
                Cascaded Control Input (CCI)

Thus, the present invention provides a way to
locally control MUX3 410' and MUX4 406'' of each
computation cell based on the cascade control output and
cascade control input associated with the computation
cell being controlled.

The portion of the control circuit 312''' used
to control MUX3 410' in each computation cell 700 can be
described by the truth table below. The truth table
describes how the M3CM control signal can be based on the
M3C field of the global control value, the locally
generated cascade control output (CCO) and the cascaded
control input (CCI) obtained, e.g., from the previous
computation cell 700 in the sequence of M computation
cells.

The 'X' marks in the above truth table denote
"don't cares" in digital logic, where the 'X' can be
either 0 or 1; the output is not affected.

Similarly, the portion of the control circuit
312''' used to control MUX4 406'' in each computation
cell 700 can be described by the truth table below.

M4C    CCO    CCI    M4CM
00     X      X      00
01     X      X      01
10     X      X      10
11     0      0      00
11     0      1      10
11     1      0      00
11     1      1      00

A control circuit 800 that implements the
functionality of the above 2 truth tables and which can
be used as the control circuit 312''' is illustrated in
Fig. 8.

As illustrated, the control circuit 800
includes first through seventh AND gates 802, 804, 808,
810, 814, 816, 820, and three OR gates 806, 812, 818
arranged as illustrated in Fig. 8. Negated inputs of AND
gates are illustrated in Fig. 8 using circles at the
location of the negated AND gate input.

A global control value input receives the 12
bit global control value output by global control
register 308. The bits of the global control value are
divided into the individual signals to which they
correspond and either output or supplied to the logic
elements of the control circuit 800 as indicated through
the use of labeling. A pointed connector is used to
indicate a signal that is supplied to one or more
correspondingly labeled AND gate inputs.

Global control value bits [0] and [1], which
correspond to signals M4C[0] and M4C[1], are supplied to
AND gates 814, 816 and 820. From these signals the AND
gate 820 generates the signal M4CM[0], which is the lower
bit of the signal M4CM.

AND gate 816 receives the cascade control
signals CCO and CCI in addition to the signals M4C[0] and
M4C[1]. The OR gate 818 ORs the outputs of the AND gates
814, 816 to generate the higher bit [1] of the M4CM
signal.
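
The gate arrangement described for M4CM can be sketched as Boolean logic. In the Python example below, which gate inputs are negated is an assumption inferred from the M4CM truth table rather than read from Fig. 8, and the function name `m4cm` is hypothetical. The sketch reproduces the truth table rows in which M4C is 00, 01 or 10, and the M4C = 11 rows with (CCO, CCI) equal to (0, 1) and (1, 0).

```python
def m4cm(m4c, cco, cci):
    """Return the 2-bit M4CM value for a 2-bit M4C field and 1-bit CCO, CCI."""
    m4c0, m4c1 = m4c & 1, (m4c >> 1) & 1
    low = m4c0 & (1 - m4c1)                     # role of AND gate 820 (assumed negation)
    pass_through = m4c1 & (1 - m4c0)            # role of AND gate 814 (assumed negation)
    cascade = m4c1 & m4c0 & (1 - cco) & cci     # role of AND gate 816 (assumed negation)
    high = pass_through | cascade               # role of OR gate 818
    return (high << 1) | low


local_select = m4cm(0b11, cco=0, cci=1)
```

When M4C is 11, the selection is made locally: only the cell whose own comparator output is 0 while its cascade input is 1 switches MUX4 to its I2 input in this sketch.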

Global control value bits [2] and [3], which
correspond to signals M3C[0] and M3C[1], are supplied to
AND gates 808, 810. AND gate 810 is also supplied with
the cascade control signals CCO and CCI. The OR gate 812
generates the lower bit [0] of the signal M3CM by ORing
the outputs of AND gates 808 and 810.

Global control value bits [2] and [3], which
correspond to signals M3C[0] and M3C[1], are also supplied
to AND gates 802, 804. AND gate 804 is also supplied
with the cascade control signals CCO and CCI. The OR
gate 806 generates the higher bit [1] of the signal M3CM
by ORing the outputs of AND gates 802 and 804.

The control signals M2C, M1C, ASC, S3R, S2R,
S1R are generated by the control circuit 800 by simply
splitting out the corresponding bits of the global
control value and using the appropriate bits as a control
signal.

The control circuit 800 is suitable for use as
the control logic circuit 312''' used in the computation
cell illustrated in Fig. 7. Control circuits 312, 312'
and 312'' may be implemented by using a control circuit
which is the same as or similar to the one illustrated in
Fig. 8. However, in such embodiments, unused inputs and
outputs and the control logic used to generate unused
outputs may be omitted for purposes of implementation
efficiency and cost savings.

The multi-purpose computation cell 700 can be
used to implement a computation engine 300 suitable for a
wide range of applications, e.g., processing functions.
Various processing operations as well as the configuring
of the elements within a computation cell 700 to perform
the processing functions will now be described.

Autocorrelation Functionality 

Autocorrelation, a special case of cross-
correlation, is an example of one function which can be
performed using a computation engine 300 which includes
computation cells 700.

An autocorrelation sequence for a finite
sequence of numbers can be described with the following
equation:

    \[ y_{xx}[n] = \sum_{k=0}^{N-1-n} x[k] \, x[k+n] \]

where x[n] is a finite input sequence of N numbers and
yxx[n] is the autocorrelation sequence of x[n]. To
compute the autocorrelation sequence, (N^2 + N) / 2
multiplications and (N^2 - N) / 2 additions are required.

As discussed above, in typical microprocessors
and DSPs with two or fewer MAC units, a software program
with an iterative loop construct is required to compute
this sequence. In the typical microprocessors or DSPs
which have only 1 or 2 multiply or MAC units, the
computation of N autocorrelation sequence numbers will
normally take approximately or more computation cycles 
due to the hardware limitations. 
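
The operation counts for an N-element autocorrelation can be checked with a direct, loop-based computation of the kind a conventional processor would run. The Python sketch below is illustrative only; the function name `autocorrelation_direct` is hypothetical, and the summation limits follow from the per-cell results listed later in this section.

```python
def autocorrelation_direct(x):
    """Compute yxx[n] = sum_k x[k] * x[k + n] and count the operations used."""
    n_samples = len(x)
    y, multiplications, additions = [], 0, 0
    for lag in range(n_samples):
        products = [x[k] * x[k + lag] for k in range(n_samples - lag)]
        multiplications += len(products)
        additions += len(products) - 1      # summing p products takes p - 1 additions
        y.append(sum(products))
    return y, multiplications, additions


y, mults, adds = autocorrelation_direct([1, 2, 3, 4])
```

For N = 4 this uses 10 multiplications and 6 additions, matching (N^2 + N) / 2 and (N^2 - N) / 2.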

With the computation engine 200 or 300 of the
present invention, each computation cell 700 can be
configured in the following fashion to compute the
autocorrelation sequence:

1) STORAGE1, STORAGE2, and STORAGE3 are initialized to
   contain 0.

This step can be performed by writing the binary number
"111000000000" into the global control register 208 or
308.

2) MUX1 selects the DATA1 input to supply Operand1

3) MUX2 selects the BROADCAST2 input to supply Operand2

4) MUX3 selects DATA3 as one of the inputs to the
   adder/subtractor 508

5) MUX4 selects the output of the multiplier 404 as the
   other input to the adder/subtractor 508.

These steps can be performed by writing the binary number
"000000100101" into the global control register 208 or
308.

For the entire computation engine 300, the input signals
are configured in the following fashion:

6) The sequence of x[0], x[1], x[2], ..., x[N - 1] is fed
   to the DATA1 input, 1 per computation cycle.

7) The sequence of x[0], x[1], x[2], ..., x[N - 1], 1 per
   computation cycle, is also fed to DATA2 which is
   coupled to the BROADCAST2 input of each of the
   computation cells 700.



After 1 computation cycle, the first
computation cell 302 would have computed
x[0]x[0].

After 2 computation cycles,
the first computation cell 302 would have
computed x[0]x[0] + x[1]x[1], and
the second computation cell 304 would have
computed x[0]x[1].

After N computation cycles, the first
computation cell 302 would have computed:
x[0]x[0] + x[1]x[1] + ... + x[N - 1]x[N - 1] = yxx[0]

The second computation cell 304 would have
computed:
x[0]x[1] + x[1]x[2] + ... + x[N - 2]x[N - 1] = yxx[1]

The Nth computation cell (306 assuming N=M)
would have computed:
x[0]x[N - 1] = yxx[N - 1]

At this point, the computation engine 300 can
be reconfigured (by writing "000000000100" into the
global control register) so that in each of the
computation cells 700:

8) MUX3 selects the DATA3 input as one of the inputs to
   the adder/subtractor 508.

9) MUX4 selects Constant (0) as the other input to the
   adder/subtractor 508.

The output of the computation engine 300 can be
used to shift out the autocorrelation sequence yxx[N - 1],
yxx[N - 2], ..., yxx[1], yxx[0]. The number of computation
cycles it takes to compute this autocorrelation sequence
is N. An additional N cycles may be used to read out the
result from the computation engine 300.
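
The cycle-by-cycle behavior listed above can be reproduced with a small simulation. The Python sketch below is a behavioral model: each cell keeps its own running sum while the DATA1 sequence advances one cell per cycle and the DATA2 sequence is broadcast to all cells. It does not attempt to model the individual MUX settings, and the function name `systolic_correlate` is hypothetical.

```python
def systolic_correlate(data1, data2, num_cells):
    """Behavioral model of M cells computing a correlation sequence.

    data1 enters the first cell and moves one cell per cycle (via STORAGE1);
    data2 is broadcast to every cell; each cell accumulates in STORAGE3.
    """
    storage1 = [0] * num_cells
    storage3 = [0] * num_cells
    for d1, d2 in zip(data1, data2):            # one computation cycle per pair
        operand1 = [d1] + storage1[:-1]         # cell j sees cell j-1's STORAGE1
        storage3 = [acc + op * d2 for acc, op in zip(storage3, operand1)]
        storage1 = operand1
    return storage3


x = [1, 2, 3, 4]
cells = systolic_correlate(x, x, num_cells=len(x))   # cell j holds yxx[j]
after_two_cycles = systolic_correlate(x[:2], x[:2], num_cells=len(x))
```

After N cycles, cell j of the model holds yxx[j], so all N autocorrelation values are available without an iterative software loop over lags.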

Cross-Correlation Functionality

The computation engine 300, implemented using
computation cells 700, can also be used to perform cross-
correlation operations.

A cross-correlation sequence for a finite
sequence of real numbers can be described with the
following equation:

    \[ y_{x_1 x_2}[n] = \sum_{k=0}^{N-1-n} x_1[k] \, x_2[k+n] \]

where x1[n] and x2[n] are finite input sequences of N
numbers and yx1x2[n] is the cross-correlation sequence
between x1[n] and x2[n]. Like autocorrelation, it
normally takes (N^2 + N) / 2 multiplications and
(N^2 - N) / 2 additions to compute a cross-correlation
sequence. In essence, an autocorrelation sequence is just
a special case of a cross-correlation sequence.

With the computation engine 300, each
computation cell 700 can be configured in the following
fashion to compute the cross-correlation sequence:

1) STORAGE1 402, STORAGE2 706, and STORAGE3 414 are
   initialized to contain the value 0.

This step can be performed by writing the
binary number "111000000000" into the global control
register 308.

2) MUX1 602' selects the DATA1 input to supply Operand1

3) MUX2 704 selects the BROADCAST2 input to supply
   Operand2

4) MUX3 410' selects the DATA3 input as the source of
   one of the inputs to the adder/subtractor 508

5) MUX4 406'' selects the output of the multiplier as
   the other input to the adder/subtractor 508.

These steps can be performed by writing the
binary number "000000100101" into the global control
register 308.

For the entire computation engine 300, at this
point the input signals would be configured in the
following fashion:

6) The sequence of x1[0], x1[1], x1[2], ..., x1[N - 1] is
   supplied to the computation engine DATA1 input, 1
   per computation cycle.

7) The sequence of x2[0], x2[1], x2[2], ..., x2[N - 1] is
   supplied, 1 per computation cycle, to the
   computation engine's DATA2 input which is coupled to
   the DATA2 input of the first computation cell and to
   the BROADCAST2 input of each one of the M computation
   cells.



After 1 computation cycle,
the first computation cell 302 would have
computed x1[0]x2[0].

After 2 computation cycles,
the first computation cell 302 would have
computed x1[0]x2[0] + x1[1]x2[1], and
the second computation cell 304 would have
computed x1[0]x2[1].

After N computation cycles,
the first computation cell 302 would have
computed x1[0]x2[0] + x1[1]x2[1] + ... +
x1[N - 1]x2[N - 1] = yx1x2[0]

The second computation cell 304 would have
computed x1[0]x2[1] + x1[1]x2[2] + ... +
x1[N - 2]x2[N - 1] = yx1x2[1]

The Nth computation cell (306 assuming N=M)
would have computed x1[0]x2[N - 1] = yx1x2[N - 1]

At this point, the computation engine 300 can
be reconfigured, e.g., by writing "000000000100" into the
global control register 308, so that in each of the
computation cells:

8) MUX3 410' selects the DATA3 input to supply one of
   the inputs to the adder/subtractor 508.

9) MUX4 406'' selects Constant (0) as the other input
   to the adder/subtractor 508.

The output of the computation engine 300 can be
used to shift out the cross-correlation sequence
yx1x2[N - 1], yx1x2[N - 2], ..., yx1x2[1], yx1x2[0]. The
number of computation cycles it takes to compute this
cross-correlation sequence is N. It takes an additional N
cycles to read out the result from the computation engine
300 assuming the engine 300 has N computation cells or
the output is taken from the Nth computation cell 700 in
the sequence of M computation cells.
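
The readout phase, in which MUX3 selects the DATA3 input and MUX4 selects the constant 0, turns the chain of STORAGE3 elements into a shift register. The Python sketch below models only that phase. It assumes the engine output is the STORAGE3 value of the last cell and that the value is sampled before each shift; both are timing assumptions made for this example, as is the function name `shift_out`.

```python
def shift_out(storage3):
    """Model the readout phase: every cycle each cell loads its neighbor's value.

    STORAGE3 of cell j receives the previous cell's STORAGE3 plus 0;
    the first cell receives 0. The engine output is taken from the last cell.
    """
    cells = list(storage3)
    output = []
    for _ in range(len(cells)):             # N cycles for N cells
        output.append(cells[-1])            # value visible at the engine output
        cells = [0] + cells[:-1]            # shift by one cell
    return output


# After the compute phase, cell j holds the correlation value for lag j.
readout = shift_out([30, 20, 11, 4])
```

The values emerge highest lag first, which is the order y[N - 1], y[N - 2], ..., y[1], y[0] given in the text.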

Scalability Of Cross-Correlation Functionality

The computation engine 300 of the present
invention is scalable. A computation engine 200 or 300
with N computation cells can be used to compute
correlation sequences shorter or longer than N.

To compute a cross-correlation of two
sequences, each sequence including I elements, e.g.,
numbers, where I<N, the computation engine is loaded with
the sequences of I numbers, the cross-correlation
sequence is computed, and then the computation results
stored in the N computation cells are shifted out of the
computation engine. N-I of the values shifted out of the
computation engine are not used, e.g., they are
discarded, while the remaining I values representing the
cross-correlation result are used. In one particular
embodiment, the first N-I values read out of the
computation engine are discarded while the remaining I
values are supplied to the processor 102 as the
correlation result.
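
The handling of sequences shorter than the engine can be sketched as follows. The Python example below assumes that cells assigned to lags of I or more are left holding 0 after the compute phase and that the last cell is read out first; the function name `correlate_short` is hypothetical.

```python
def correlate_short(x1, x2, num_cells):
    """Correlate two I-element sequences on an N-cell engine, with I <= N."""
    i_len = len(x1)
    if len(x2) != i_len or i_len > num_cells:
        raise ValueError("sequences must have equal length I <= N")
    # Value left in cell j after the compute phase (0 for lags j >= I).
    cells = [sum(x1[k] * x2[k + j] for k in range(max(i_len - j, 0)))
             for j in range(num_cells)]
    shifted_out = cells[::-1]                      # last cell is read first
    discarded = shifted_out[:num_cells - i_len]    # first N - I values
    result = shifted_out[num_cells - i_len:]       # remaining I values
    return result, discarded


result, discarded = correlate_short([1, 2, 3], [4, 5, 6], num_cells=5)
```

With I = 3 and N = 5, the first two values shifted out are discarded and the remaining three are the correlation values for lags 2, 1 and 0.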



Consider for example the case where a cross-
correlation result is to be generated from two input
sequences which are longer than N, e.g., each sequence
having 2N elements. With the computation engine 200,
300, each computation cell 700 can be configured in the
following fashion to compute the cross-correlation
sequence of 2N numbers:

1) STORAGE1 402, STORAGE2 706, and STORAGE3 414 are
   initialized to contain 0.

2) MUX1 602' selects the DATA1 input to supply Operand1

3) MUX2 704 selects the BROADCAST2 input to supply
   Operand2

4) MUX3 410' selects the DATA3 input to supply one of
   the inputs to the adder/subtractor 508

5) MUX4 406'' selects the output of the multiplier 404
   as the other input to the adder/subtractor 508.

For the entire computation engine 300, the
input signals are configured in the following fashion:

6) The first sequence of x1[0], x1[1], x1[2], ..., x1[2N -
   1] is fed to the computation engine's DATA1 input, 1
   per computation cycle.

7) The second sequence of x2[0], x2[1], x2[2], ..., x2[2N -
   1] is fed, 1 per computation cycle, to the
   computation engine's DATA2 input and is thus supplied
   to the DATA2 input of the first computation cell 302 in
   the sequence of computation cells 302, 304, 306.



After 2N computation cycles:

The first computation cell 302 would have
computed:
x1[0]x2[0] + x1[1]x2[1] + ... +
x1[2N - 1]x2[2N - 1] = yx1x2[0]

The second computation cell 304 would have
computed:
x1[0]x2[1] + x1[1]x2[2] + ... +
x1[2N - 2]x2[2N - 1] = yx1x2[1]

The Nth computation cell 306 would have
computed:
x1[0]x2[N - 1] + x1[1]x2[N] + ... + x1[N]x2[2N - 1]
= yx1x2[N - 1]

At this point, the computation engine 300 can
be reconfigured so that in each of the computation cells
302, 304, 306:

8) MUX3 410' selects the DATA3 input to supply one of
   the inputs to the adder/subtractor 508.

9) MUX4 406'' selects the logic value 0 as the other
   input to the adder/subtractor 508.



The output of the computation engine 300 can be
used to shift out the cross-correlation sequence
yx1x2[N - 1], yx1x2[N - 2], ..., yx1x2[1], yx1x2[0]. This
is half of the cross-correlation sequence for the 2N
input. To complete the 2nd half of the cross-correlation
sequence, the computation cells are reconfigured as
follows:

10) The contents of STORAGE1 402, STORAGE2 706, and
    STORAGE3 414 are cleared so that they contain the
    value 0.

11) MUX1 602', MUX2 704, MUX3 410', and MUX4 406''
    are configured as in steps 2 to 5.

For the entire computation engine 300, the
input signals are then configured in the following
fashion:

12) The first sequence of x1[0], x1[1], x1[2], ...,
    x1[N - 1] is fed to the DATA1 input of the
    computation engine, 1 per computation cycle.

13) The second sequence of x2[N], x2[N + 1], x2[N + 2],
    ..., x2[2N - 1] is also fed, 1 per computation
    cycle, to the computation engine's DATA2 signal
    input which is coupled to the DATA2 input of the
    first computation cell 302 and to the BROADCAST2
    signal input of each one of the M computation cells
    302, 304, 306.

After N computation cycles:

The first computation cell 302 would have
computed:
x1[0]x2[N] + x1[1]x2[N + 1] + ... +
x1[N - 1]x2[2N - 1] = yx1x2[N]

The second computation cell 304 would have
computed:
x1[0]x2[N + 1] + x1[1]x2[N + 2] + ... +
x1[N - 2]x2[2N - 1] = yx1x2[N + 1]

The Nth computation cell (306 assuming N=M)
would have computed:
x1[0]x2[2N - 1] = yx1x2[2N - 1]

The output of the computation engine 300 can be
used to shift out the cross-correlation sequence
yx1x2[2N - 1], yx1x2[2N - 2], ..., yx1x2[N + 1], yx1x2[N].
This is the 2nd half of the cross-correlation sequence for
the 2N input. The total number of computation cycles it
takes to compute this cross-correlation sequence is 3N,
assuming the computation engine includes N computation
cells (N=M). It takes an additional 2N cycles to read out
the result from the computation engine 300.
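
The two-pass procedure for 2N-element sequences can be checked with a short simulation. The Python sketch below is a behavioral model under the same assumptions as the single-pass case: each cell accumulates its own sum while the first sequence advances one cell per cycle and the second is broadcast. The function names are hypothetical.

```python
def systolic_pass(data1, data2, num_cells):
    """One compute pass; returns the cell contents and the cycles used."""
    storage1 = [0] * num_cells
    storage3 = [0] * num_cells
    cycles = 0
    for d1, d2 in zip(data1, data2):
        operand1 = [d1] + storage1[:-1]      # DATA1 advances one cell per cycle
        storage3 = [acc + op * d2 for acc, op in zip(storage3, operand1)]
        storage1 = operand1
        cycles += 1
    return storage3, cycles


def correlate_2n(x1, x2, n):
    """Correlate two 2N-element sequences on an N-cell engine in two passes."""
    first, c1 = systolic_pass(x1, x2, n)            # lags 0 .. N-1, 2N cycles
    second, c2 = systolic_pass(x1[:n], x2[n:], n)   # lags N .. 2N-1, N cycles
    return first + second, c1 + c2


y, cycles = correlate_2n([1, 2, 3, 4], [5, 6, 7, 8], n=2)
```

With N = 2 the model returns the four correlation values in 6 compute cycles, which is 3N.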

In general, this computation method can be
extended to compute the correlation sequence of Y×N
numbers. The computations are divided into Y iterations.
N correlation sequence numbers are computed in each
iteration. The 1st iteration uses Y×N computation cycles,
the 2nd iteration uses (Y - 1)×N cycles, the 3rd iteration
uses (Y - 2)×N cycles and the final Yth iteration uses N
cycles, assuming use of a computation engine with N
computation cells. Therefore, using an N cell computation
engine 300, a correlation sequence of Y×N numbers can be
computed in the following number of computation cycles:

    \[ \frac{Y (Y + 1)}{2} \times N \]

An additional Y×N cycles are used to read out
the result from the systolic computation engine.
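
The total cycle count follows from summing the per-iteration counts Y×N, (Y - 1)×N, ..., N. The short Python check below adds them up and compares the sum with the closed form N × Y × (Y + 1) / 2; the function name `compute_cycles` is hypothetical.

```python
def compute_cycles(y_blocks, n_cells):
    """Total compute cycles for a correlation of Y*N numbers on N cells."""
    per_iteration = [(y_blocks - i) * n_cells for i in range(y_blocks)]
    total = sum(per_iteration)
    # Closed form: N * Y * (Y + 1) / 2.
    assert total == n_cells * y_blocks * (y_blocks + 1) // 2
    return total


cycles_for_2n = compute_cycles(2, 40)
```

For Y = 2 the result is 3N, in agreement with the 2N-element example.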

Sorting Functionality 

The computation engine 300 can also be used to
sort a list of numbers. There are various published
sorting algorithms available, with the "fast" ones having
an execution order O(N log2 N), which means that the
number of computation cycles the sorting algorithm needs
is proportional to N log2 N, where N is the number of
entries to be sorted. A slow algorithm might have an
execution order O(N^2).

The determining factor in the execution order
of a sorting algorithm is usually the number of
comparisons the algorithm must make between the entries
in order to perform sorting.

With the computation engine of the present 
invention, N comparisons can be made simultaneously per 
computation cycle assuming the computation engine 300 
includes N computation cells (N=M). Each computation cell
302, 304, 306 can compare its content with the current 
entry in the list of numbers being sorted to determine 
the proper location in the final, sorted, list. 




To perform such a sorting algorithm, the
computation engine 300 can be configured in the following
fashion:

1) MUX1 602 selects the BROADCAST1 signal input to
supply Operand1.

2) MUX2 704 selects the BROADCAST2 signal input to
supply Operand2.

3) STORAGE3 414 stores both the entry and its
associated index in the unsorted list. This can be
accomplished because STORAGE3 414 has approximately
twice the bit-width required to store any entry
in the unsorted list. STORAGE3 414 can be split to
store the index of the entry in the top half (most
significant bits) and the entry itself in the bottom
half (least significant bits) of the bits.

4) MUX3 410' is controlled by the cascade control input
signal (set to 0 in the case of the first
computation cell 302 and received from the previous
computation cell for each of the other computation
cells) and the cascade control output of the current
computation cell obtained from comparator 708.

• If the comparator result indicates that Operand2
is greater than the number portion of the DATA3
input signal, then MUX3 410' selects the DATA3
input signal as one input to the adder.

• If the comparator result indicates that Operand2
is less than the number portion of the DATA3
input signal AND the cascade control signal from
the previous computation cell also indicates so,
then MUX3 410' selects the DATA3 input signal as
one input to the adder.

• If the comparator result indicates that Operand2
is less than the number portion of the DATA3
input AND the cascade control input signal from
the previous computation cell indicates that
Operand2 was greater than the number portion of
the DATA3 input signal in the previous
computation cell, then MUX3 410' selects Operand2
(prepended with 0 on the index portion) as one
input to the adder.

5) MUX4 406'' is controlled by the cascaded control
input signal received from the previous computation
cell and the comparator result, e.g., the cascade
control output signal generated by the current
computation cell:

• If the comparator result indicates that Operand2
is greater than the number portion of the DATA3
input signal, then MUX4 406'' selects Constant 0
as the other input to the adder 508.

• If the comparator result indicates that Operand2
is less than the number portion of the DATA3 input
signal AND the cascaded control input signal
received from the previous computation cell also
indicates so, then MUX4 406'' selects Constant 0
as the other input to the adder 508.

• If the comparator result indicates that Operand2
is less than the number portion of the DATA3
input signal AND the cascaded control input
signal received from the previous computation
cell indicates that Operand2 was greater than the
number portion of the DATA3 input signal in the
previous computation cell, then MUX4 406''
selects Operand1 (appended with 0 on the entry
portion) as the other input to the adder.

The combination of what MUX3 410' and MUX4
406'' select as the input to the adder has the following
effect:

• If the comparator result indicates that Operand2 is
greater than the number portion of the DATA3 input
signal, then the DATA3 input signal is stored back
into STORAGE3 414.

• If the comparator result indicates that Operand2 is
less than the number portion of the DATA3 input signal
AND the cascade control signal received from the
previous computation cell also indicates so, then
the DATA3 input signal is stored in STORAGE3 414.

• If the comparator result indicates that Operand2 is
less than the number portion of the DATA3 input
signal AND the cascade control signal received from
the previous computation cell indicates that
Operand2 was greater than the number portion of the
DATA3 input signal in the previous computation cell,
then Operand2 and its associated index are stored
into STORAGE3 414.

The above steps can be performed by simply
writing "000010101111" into the global control register
308.

For the entire computation engine 300, the
input signals are configured in the following fashion:

6) The sequence of 0, 1, 2, ..., N - 1 as the index to the
unsorted list is fed, one computation cycle at a
time, to the DATA1 signal input thereby resulting in
the signal being supplied to the BROADCAST1 input of
each computation cell in the computation engine 300.

7) The sequence of x[0], x[1], x[2], ..., x[N - 1] as the
entry to the unsorted list is fed, one computation
cycle at a time, to the DATA2 input of the
computation engine 300 thereby resulting in the
signal being supplied to the BROADCAST2 input of
each of the computation cells in the computation
engine 300.

The configuration of the computation engine 300
effectively implements an insertion sort algorithm.
After N computation cycles, the systolic computation
engine can be reconfigured so that in each computation
cell:

8) MUX3 410' selects the DATA3 input signal as one
input to the adder 508.

9) MUX4 406'' selects Constant (0) as the other input
to the adder 508.

The output of the computation engine 300 can be
used to shift out the sorted sequence of numbers and
their associated index in the unsorted sequence, from the
largest to the smallest. The number of computation
cycles used to complete the sorting is N. An additional
N cycles are used to read out the result from the
computation engine 300.
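The parallel insertion sort described above can be illustrated with a small behavioral model. The sketch below is not a signal-level model of MUX3, MUX4 or the cascade control lines: the comparison polarity, the treatment of an empty cell as smaller than any entry, and the direction in which cell contents shift are all assumptions chosen so that the cells end up holding the entries from largest to smallest. The function name is hypothetical.

```python
def parallel_insertion_sort(entries):
    """Behavioral model: one broadcast entry per cycle, N cells.

    Each cell holds an (index, value) pair, mirroring the split use of
    STORAGE3, or None while it is still empty (an assumption).
    """
    cells = [None] * len(entries)
    for index, value in enumerate(entries):
        # In hardware all cells would compare in the same cycle.
        greater = [c is None or value > c[1] for c in cells]
        updated = []
        for i, cell in enumerate(cells):
            if not greater[i]:
                updated.append(cell)            # cell keeps its content
            elif i == 0 or not greater[i - 1]:
                updated.append((index, value))  # boundary cell takes new entry
            else:
                updated.append(cells[i - 1])    # take the previous cell's content
        cells = updated
    return cells


result = parallel_insertion_sort([3, 1, 4, 1, 5])
print([value for _, value in result])   # sorted values, largest first
print([index for index, _ in result])   # positions in the unsorted list
```

The model uses one pass of the outer loop per input entry, which corresponds to the N computation cycles stated for the sort; reading out the cells would take the additional N cycles.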



FIR Filtering Functionality

With the computation engine 300, the engine's
computation cells can be configured in the following
fashion to compute an FIR (finite impulse response)
filter output sequence:

1) STORAGE1 402 is initialized to contain the filter
impulse response or the filter coefficients in
reverse, i.e., the first computation cell 302 will
have h[N - 1] in STORAGE1 402, the second
computation cell 304 will have h[N - 2] in its
STORAGE1 402, and so on. Computation Cell N will
have h[0] in its STORAGE1 402. This will generally
take N computation cycles to complete the
configuration, e.g., loading of filter coefficients
into the STORAGE1 elements of individual
computation cells.

2) STORAGE3 is initialized to contain 0 for each of the
computation cells 302, 304, 306 in the computation
engine 300.

3) MUX1 602 selects the DATA1 input signal to supply
Operand1.

4) MUX2 704 selects the BROADCAST2 input to supply
Operand2.

5) MUX3 410' selects the DATA3 input to provide one of
the inputs to the adder/subtractor 508.

6) MUX4 406'' selects the output of the multiplier 404
as the other input to the adder/subtractor 508.

The computation engine 300 can be configured to
perform step 1 by writing "000001000000" into the global
control register 308. Step 2 can be accomplished by
writing "001000000000" into the global control register
308. Steps 3 to 6 can be accomplished by writing
"000000100101" into the global control register 308.

7) The sequence of x[0], x[1], x[2], ..., x[N - 1], and so
on, is fed, one per computation cycle, to the DATA2
input of the computation engine, which is coupled to
the DATA2 input of the first computation cell 302
and to the BROADCAST2 input of each of the
computation engine's computation cells 302, 304,
306.

8) The constant 0 is fed to the DATA3 input of the
computation engine 300.

The output of the computation engine 300 can be
used to read the filter output sequence y[0], y[1], ...,
y[N - 2], y[N - 1], and so on.

The computation engine of the present invention
can also be used to implement the convolution of 2
sequences since a convolution can be expressed by the
same equation as that used to represent the supported FIR
filter discussed above.
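The FIR configuration can be illustrated with a behavioral model in which each cell multiplies the broadcast sample by its stored coefficient and adds the partial sum received from the previous cell. The sketch below assumes that the partial sum passed between cells is the previous cell's STORAGE3 value from the preceding cycle and that the output is read from the last cell in the same cycle; these timing details and the function name are assumptions, not taken from the description.

```python
def systolic_fir(h, x):
    """Behavioral model of an N-cell FIR: y[n] = sum_k h[k] * x[n - k]."""
    coeff = list(reversed(h))   # cell i holds h[N - 1 - i], as in step 1
    acc = [0] * len(h)          # STORAGE3 of each cell, reset to 0 (step 2)
    out = []
    for sample in x:            # one broadcast sample per cycle (step 7)
        # First cell receives constant 0 (step 8); the others receive the
        # previous cell's stored partial sum from the preceding cycle.
        incoming = [0] + acc[:-1]
        acc = [p + c * sample for p, c in zip(incoming, coeff)]
        out.append(acc[-1])     # last cell holds the filter output
    return out


# Impulse input returns the impulse response, followed by zeros.
print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))
```

Because the last cell holds h[0], each output combines the newest sample weighted by h[0] with progressively older samples weighted by h[1], h[2], and so on, which is the same sum used for the convolution of two sequences.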



Parallel Multiply And Accumulate Functionality

The computation engine 300 implemented using
computation cells 700 can also be configured to be a
parallel MAC unit capable of performing N
multiply-and-accumulate operations at once (assuming N=M)
by writing "000000000001" into the global control register
308. In such an application, N computation cycles are used
to shift in the operands, e.g., by writing "110000000000"
into the global control register, and N computation
cycles are used to shift out the result. The shifting
out of the result may be achieved by writing
"001000000000" into the global control register 308.
Thus, the computation engine 300 of the present invention
can be used to provide high speed MAC unit functionality
to a microcontroller, DSP or other digital circuit.

Additional Functionality

The following table summarizes various
functions, with their associated global control register
encoding, that can be performed by a computation engine
300 which is implemented using multipurpose computation
cells 700.



Function                         S1R  S2R  S3R  ASC  M1C  M2C  M3C  M4C

No Operations (NOP)               0    0    0    0   00   00   00   00
Reset Storage1                    1    0    0    0   00   00   00   00
Reset Storage2                    0    1    0    0   00   00   00   00
Reset Storage3                    0    0    1    0   00   00   00   00
Shift Storage1                    0    0    0    0   01   00   00   00
Shift Storage2                    0    0    0    0   00   01   00   00
Shift Storage3                    0    0    0    0   00   00   01   00
Compute Correlations              0    0    0    0   01   10   00   01
Compute FIR                       0    0    0    0   00   10   01   01
Sort                              0    0    0    0   10   10   11   11
Parallel Multiply and Add         0    0    0    0   00   00   00   01
Parallel Multiply and Subtract    0    0    0    1   00   00   00   01



Note that some of the functions can be combined
to be performed together. For example, functions Reset
Storage1, Reset Storage2, and Reset Storage3 can be
performed together when "111000000000" is written into
the global control register. Similarly, functions Shift
Storage1 and Shift Storage2 can be performed together
when "000001010000" is written into the global control
register.
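The 12-bit control word can be viewed as the concatenation of the eight fields S1R, S2R, S3R, ASC, M1C, M2C, M3C and M4C in that order. The sketch below is an illustration under two assumptions: that the field widths are 1, 1, 1, 1, 2, 2, 2, 2 bits as the table entries suggest, and that combinable functions are merged with a bitwise OR of their encodings. The dictionary keys and function names are hypothetical, and only a few table rows are included.

```python
FIELDS = ("S1R", "S2R", "S3R", "ASC", "M1C", "M2C", "M3C", "M4C")
WIDTHS = (1, 1, 1, 1, 2, 2, 2, 2)  # assumed widths, 12 bits in total

# Encodings copied from the table rows (hypothetical key names).
FUNCTIONS = {
    "RESET_STORAGE1": "100000000000",
    "RESET_STORAGE2": "010000000000",
    "RESET_STORAGE3": "001000000000",
    "SHIFT_STORAGE1": "000001000000",
    "SHIFT_STORAGE2": "000000010000",
    "COMPUTE_FIR": "000000100101",
    "SORT": "000010101111",
}


def combine(*names):
    """OR together the encodings of functions performed in one write."""
    value = 0
    for name in names:
        value |= int(FUNCTIONS[name], 2)
    return format(value, "012b")


def split_fields(word):
    """Split a 12-character control word into its named fields."""
    assert len(word) == sum(WIDTHS)
    fields, pos = {}, 0
    for name, width in zip(FIELDS, WIDTHS):
        fields[name] = word[pos:pos + width]
        pos += width
    return fields


print(combine("RESET_STORAGE1", "RESET_STORAGE2", "RESET_STORAGE3"))
print(combine("SHIFT_STORAGE1", "SHIFT_STORAGE2"))
print(split_fields(FUNCTIONS["SORT"]))
```

Only functions that drive different fields can sensibly be merged this way; the sketch does not check for conflicting multiplexer selections.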



Variations on the above described exemplary
embodiments will be apparent to those skilled in the art
in view of the above description of the invention. Such
embodiments are considered to be part of the present
invention.

For example, the computation engine of the
present invention may, and in one embodiment does,
include parallel outputs so that the processing result
generated by each computation cell can be read out in
parallel thereby avoiding the need to shift out the
computation result. In addition, the computation engine
of the present invention can be configured and used to
perform a wide variety of processing operations in
addition to those specifically described herein.
Furthermore, while voice processing applications have
been described, the computation engine of the present
invention may be used in any number of processing
applications and is not limited to audio and/or voice
data processing applications.



WHAT IS CLAIMED IS:

1. A digital signal processor, comprising:

a software programmable processing circuit for
performing signal processing operations under software
control; and

a computation engine coupled to said software
programmable processing circuit for performing a plurality
of digital signal processing operations including multiply
and add operations in parallel, the computation engine
including:

a plurality of first through Mth
computation cells, where M is a positive integer
greater than 2, each of the M computation cells
including a multiplier and an adder circuit.


2. The digital signal processor of claim 1, wherein the
first through Mth computation cells are coupled together in
series, a data input of the first computation cell being
coupled to said programmable processing circuit for
receiving data to be processed, a data output of the Mth
computation cell being coupled to said first software
programmable processing circuit for supplying data thereto.

3. The digital signal processor of claim 2, wherein each
computation cell further comprises:

a control value input for receiving a control
value used to control the configuration of circuitry
included in the computation cell.

4. The digital signal processor of claim 3, wherein each
computation cell further comprises:

a comparator;

at least two storage elements; and

means for configuring the connections between the
multiplier, adder, comparator and storage elements as a
function of the control value supplied to the computation
cell.

5. The digital signal processor of claim 4, wherein said
means for configuring includes:

a plurality of multiplexers; and

control logic circuitry for generating
multiplexer control signals from said control value,
different bits of said control value being used to generate
different multiplexer control signals.

6. The digital signal processor of claim 4, wherein the
software programmable processing circuit and the
computation engine are implemented as a single chip.

7. The digital signal processor of claim 1, wherein the
software programmable processing circuit and the
computation engine are implemented on the same piece of
semi-conductor material.






8. The digital signal processor of claim 1, further
comprising:

an additional software programmable processing
circuit for performing signal processing operations under
software control coupled to said computation engine.

9. The digital signal processor of claim 8, further
comprising:

an input selection circuit for controlling the
supply of data from said software programmable processing
circuit and said additional software programmable
processing circuit to the computation engine.

10. The digital signal processor of claim 9, wherein the
input selection control circuit is responsive to a control
signal from said software programmable processing circuit
to supply data from a selected one of said software
programmable processing circuit and said additional
software programmable processing circuit to said
computation engine at any given time.

11. A digital signal processor, comprising:

first and second programmable processing
circuits; and

a computation engine coupled to said programmable
processing circuits, the computation engine including a
plurality of computation cells arranged to perform signal
processing operations in parallel.

12. The digital signal processing circuit of claim 11,
wherein said plurality of computation cells include more
than two computation cells.

13. The digital signal processing circuit of claim 11,
further comprising:

means for time sharing the computation engine
between said first and second programmable processing units
on a time shared basis.

14. The digital signal processing circuit of claim 13,
wherein the computation cells are configurable, further
comprising:

means for controlling the configuration of the
computation cells to perform different processing
operations at different times.

15. The digital signal processing circuit of claim 13,
wherein the computation cells are configurable, further
comprising:

means for controlling the configuration of the
computation cells to perform one of a correlation operation
and a sorting operation.

16. The digital signal processing circuit of claim 15,
wherein the means for controlling the configuration
includes a control register for storing a control value
used to control the configuration of each computation cell
in said plurality of computation cells.

17. The digital signal processing circuit of claim 16,
wherein the plurality of computation cells includes at
least 20 computation cells.

18. The digital signal processing circuit of claim 15,
wherein the plurality of computation cells includes at
least 240 computation cells.

19. The digital signal processor of claim 11, wherein the
plurality of computation cells includes first through Mth
computation cells, the first through Mth computation cells
being coupled together in series, a data input of a first
computation cell in the series of M computation cells being
coupled to said first and second programmable processing
circuits for receiving data to be processed, a data output
of the Mth computation cell being coupled to said first and
second software programmable processing circuits for
supplying data thereto.

20. The digital signal processor of claim 19, wherein a
controllable switch is used to couple the first and second
programmable processing circuits to the data input of the
first computation cell.

21. The digital signal processing circuit of claim 11,
wherein each of the computation cells includes at least a
multiplier and one adder.

22. The digital signal processing circuit of claim 21,
wherein each computation cell further includes at least one
storage device for storing the result of a processing
operation performed by said computation cell.

23. A computation engine, comprising:

a plurality of first through Mth computation cells for
performing processing operations in parallel, where M is an
integer greater than two, the first through Mth computation
cells being coupled together in series, the first
computation cell in the series of M computation cells
including a data input of a first computation cell for
receiving data to be processed by the series of M
computation cells, the Mth computation cell including a
data output for outputting data processed by the series of
M computation cells,

each computation cell comprising:

a subtractor and a multiplier.

24. The computation engine of claim 23, wherein each
computation cell further comprises:

a storage device; and

means for configuring connections between the
subtractor, multiplier and storage device.

25. The computation engine of claim 24, wherein each
computation cell further includes a comparator; and

wherein the means for configuring the connections
between the subtractor, multiplier and storage device is
responsive to a control signal to configure the computation
cell to perform part of a sorting operation.

26. The method of claim 24, wherein the subtractor in each
of the computation cells is part of a configurable
adder/subtractor circuit.

27. A method of using a configurable computation engine
including a plurality of M computation cells, where M is an
integer greater than two, to perform a digital signal
processing operation, the method comprising:

configuring the computation cells within the
computation engine to perform correlation processing
operations;

operating the computation engine to perform a
correlation operation;

reconfiguring the computation cells within the
computation engine to perform filtering processing
operations; and

operating the computation engine to perform a
filtering operation.

28. The method of claim 27, further comprising the steps
of:

reconfiguring the computation cells within the
computation engine to perform sorting processing
operations; and

operating the computation engine to perform a
sorting operation.

29. The method of claim 26, further comprising the step
of:

supplying digital audio data corresponding to a
first voice channel to said computation engine prior to
performing said correlation operation, said correlation
operation being performed on said supplied digital audio
data.



30. The method of claim 29, wherein said correlation
operation is a cross-correlation operation.

31. The method of claim 29, wherein said correlation
operation is an auto-correlation operation.

32. The method of claim 29, further comprising the step
of:

supplying digital audio data corresponding to a
second voice channel to said computation engine prior to
performing said filtering operation, said filtering
operation being performed on said supplied digital audio
data.

33. The method of claim 32, wherein said filtering
operation is a finite impulse response filtering operation.

34. The method of claim 27, further comprising the steps
of:

supplying digital data from a first programmable
processor to the computation engine to be used in
performing the correlation operation; and

supplying digital data from a second programmable
processor to the computation engine to be used in
performing the filtering operation.

35. The method of claim 27, wherein M is at least 20, the
step of configuring the computation cells within the
computation engine including the step of supplying a
configuration control value to each of the M computation
cells.

36. The method of claim 35, wherein the step of supplying
a configuration control value to each of the M computation
cells includes the step of supplying the same multi-bit
control value to each of the M computation cells.

37. A method of using a configurable computation engine
including a plurality of M computation cells, to perform a
digital signal processing operation, where M is an integer
greater than one, the method comprising:

configuring the computation cells within the
computation engine to perform sorting processing
operations;

operating the computation engine to perform a
sorting operation;

reconfiguring the computation cells within the
computation engine to perform filtering processing
operations; and

operating the computation engine to perform a
filtering operation.



38. The method of claim 37, further comprising the steps
of:

supplying digital data from a first programmable
processor to the computation engine to be used in
performing the sorting operation; and

supplying digital data from a second programmable
processor to the computation engine to be used in
performing the filtering operation.

39. The method of claim 38, wherein M is at least 8, the
step of configuring the computation cells within the
computation engine including the step of supplying a
configuration control value to each of the M computation
cells.

40. The method of claim 39, wherein the step of supplying
a configuration control value to each of the M computation
cells includes the step of supplying the same multi-bit
control value to each of the M computation cells.

41. A method of using a configurable computation engine
including a plurality of M computation cells, to perform a
digital signal processing operation, wherein M is a
positive integer greater than 1, the method comprising:

configuring the computation cells within the
computation engine to perform correlation processing
operations;

operating the computation engine to perform a
correlation operation;

reconfiguring the computation cells within the
computation engine to perform sorting processing
operations; and

operating the computation engine to perform a
sorting operation.

42. The method of claim 41, further comprising the steps
of:

supplying digital data from a first programmable
processor to the computation engine to be used in
performing the correlation operation; and

supplying digital data from a second programmable
processor to the computation engine to be used in
performing the sorting operation.

43. The method of claim 41, wherein M is at least 8, the
step of configuring the computation cells within the
computation engine including the step of supplying a
configuration control value to each of the M computation
cells.

44. The method of claim 42, wherein the step of
supplying a configuration control value to each of the M
computation cells includes the step of supplying the same
multi-bit control value to each of the M computation
cells.